Collaborating Authors

Dipanjan Sarkar


Explainable Artificial Intelligence - Demystifying the Hype by Dipanjan Sarkar #ODSC_India

#artificialintelligence

The field of Artificial Intelligence powered by Machine Learning and Deep Learning has gone through some phenomenal changes over the last decade. Starting off as a purely academic and research-oriented domain, it has seen widespread industry adoption across diverse domains including retail, technology, healthcare, science and many more. More often than not, the standard toolbox of machine learning, statistical or deep learning models remains the same. New models such as Capsule Networks do come into existence, but their industry adoption usually takes several years. Hence, in the industry, the main focus of data science or machine learning is more 'applied' than theoretical, and effective application of these models on the right data to solve complex real-world problems is of paramount importance.


Hands-On Transfer Learning with Python: Implement advanced deep learning and neural network models using TensorFlow and Keras: Dipanjan Sarkar, Raghav Bali, Tamoghna Ghosh: 9781788831307: Amazon.com: Books

#artificialintelligence

Dipanjan (DJ) Sarkar is a Data Scientist at Intel, leveraging data science, machine learning, and deep learning to build large-scale intelligent systems. He holds a master of technology degree with specializations in Data Science and Software Engineering. He has been an analytics practitioner for several years now, specializing in machine learning, NLP, statistical methods, and deep learning. He is passionate about education and also acts as a Data Science Mentor at various organizations like Springboard, helping people learn data science. He is also a key contributor and editor for Towards Data Science, a leading online journal on AI and Data Science.


Text Analytics with Python: A Practitioner's Guide to Natural Language Processing: Dipanjan Sarkar: 9781484243534: Amazon.com: Books

#artificialintelligence

Leverage Natural Language Processing (NLP) in Python and learn how to set up your own robust environment for performing text analytics. The second edition of this book will show you how to use the latest state-of-the-art frameworks in NLP, coupled with Machine Learning and Deep Learning to solve real-world case studies leveraging the power of Python. This edition has gone through a major revamp introducing several major changes and new topics based on the recent trends in NLP. We have a dedicated chapter around Python for NLP covering fundamentals on how to work with strings and text data along with introducing the current state-of-the-art open-source frameworks in NLP. We have a dedicated chapter on feature engineering representation methods for text data including both traditional statistical models and newer deep learning based embedding models.


Named Entity Recognition: A Practitioner's Guide to NLP

#artificialintelligence

In any text document, there are particular terms that represent specific entities that are more informative and have a unique context. These entities are known as named entities, which more specifically refer to terms that represent real-world objects like people, places, organizations, and so on, which are often denoted by proper names. A naive approach could be to find these by looking at the noun phrases in text documents. Named entity recognition (NER), also known as entity chunking/extraction, is a popular technique used in information extraction to identify and segment the named entities and classify or categorize them under various predefined classes. SpaCy has some excellent capabilities for named entity recognition.


Implementing Deep Learning Methods and Feature Engineering for Text Data: FastText

@machinelearnbot

Editor's note: This post is only one part of a far more thorough and in-depth original, found here, which covers much more than what is included here. The FastText model was first introduced by Facebook in 2016 as an extension and supposed improvement of the vanilla Word2Vec model. It is based on the original paper titled 'Enriching Word Vectors with Subword Information' by Bojanowski et al. (with Mikolov as a co-author), which is an excellent read to gain an in-depth understanding of how this model works. Overall, FastText is a framework for learning word representations and also performing robust, fast and accurate text classification. The framework is open-sourced by Facebook on GitHub and claims to have the following.
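The "subword information" in the paper's title refers to character n-grams: each word is wrapped in boundary markers and broken into overlapping character sequences, and the word's vector is built from the vectors of those pieces. The following is a minimal sketch of just the n-gram extraction step, not the FastText library itself; the function name char_ngrams and its parameter defaults are assumptions chosen for illustration.

```python
def char_ngrams(word, min_n=3, max_n=6):
    """Return the character n-grams of `word`, padded with '<' and '>'.

    Hypothetical helper for illustration; not part of the FastText API.
    """
    padded = f"<{word}>"  # boundary markers distinguish prefixes and suffixes
    ngrams = []
    for n in range(min_n, max_n + 1):
        for start in range(len(padded) - n + 1):
            ngrams.append(padded[start:start + n])
    return ngrams


# Trigrams only, to keep the output short.
print(char_ngrams("where", min_n=3, max_n=3))
# ['<wh', 'whe', 'her', 'ere', 're>']

# Related words share n-grams, which is why subword models can relate
# rare or unseen words to known ones.
shared = set(char_ngrams("reading", 3, 3)) & set(char_ngrams("reader", 3, 3))
print(sorted(shared))
# ['<re', 'ead', 'rea']
```

Because "reading" and "reader" share several n-grams, a model that sums n-gram vectors places them near each other even if one of them is rare in the training corpus.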


Implementing Deep Learning Methods and Feature Engineering for Text Data: The GloVe Model

@machinelearnbot

Editor's note: This post is only one part of a far more thorough and in-depth original, found here, which covers much more than what is included here. GloVe stands for Global Vectors. It is an unsupervised learning model that can be used to obtain dense word vectors similar to Word2Vec. However, the technique is different: training is performed on an aggregated global word-word co-occurrence matrix, giving us a vector space with meaningful sub-structures. This method was invented at Stanford by Pennington et al., and I recommend reading the original paper, 'GloVe: Global Vectors for Word Representation', to get some perspective on how this model works. We won't cover the implementation of the model from scratch in too much detail here, but if you are interested in the actual code, you can check out the official GloVe page.
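To make the "global word-word co-occurrence matrix" concrete, here is a minimal sketch of how such counts could be aggregated over a tokenized corpus. It is a simplification under stated assumptions: it uses plain counts within a symmetric window and stores the matrix sparsely as a dictionary, and it stops before any vector training. The function name cooccurrence_counts, the window parameter, and the toy corpus are hypothetical.

```python
from collections import defaultdict


def cooccurrence_counts(sentences, window=2):
    """Count how often each ordered word pair appears within `window` tokens.

    Hypothetical helper for illustration; plain counts, symmetric window.
    """
    counts = defaultdict(int)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            # Look back at most `window` tokens and update both directions,
            # which keeps the resulting matrix symmetric.
            for j in range(max(0, i - window), i):
                context = tokens[j]
                counts[(word, context)] += 1
                counts[(context, word)] += 1
    return dict(counts)


# Toy corpus (an assumption for the example, already tokenized).
corpus = [["ice", "is", "cold"], ["steam", "is", "hot"]]
counts = cooccurrence_counts(corpus, window=2)

print(counts[("ice", "cold")])      # 1
print(("ice", "steam") in counts)   # False: the words never share a window
```

The counts are aggregated over the whole corpus before any training happens, which is the sense in which the statistics are "global".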


Robust Word2Vec Models with Gensim & Applying Word2Vec Features for Machine Learning Tasks

#artificialintelligence

Editor's note: This post is only one part of a far more thorough and in-depth original, found here, which covers much more than what is included here. While our implementations are decent enough, they are not optimized enough to work well on large corpora. The gensim framework, created by Radim Řehůřek, provides a robust, efficient and scalable implementation of the Word2Vec model. We will apply it to our Bible corpus. In our workflow, we will tokenize our normalized corpus and then focus on the following four parameters in the Word2Vec model to build it.


Understanding Feature Engineering: Deep Learning Methods for Text Data

#artificialintelligence

Editor's note: This post is only one part of a far more thorough and in-depth original, found here, which covers much more than what is included here. Working with unstructured text data is hard, especially when you are trying to build an intelligent system that interprets and understands free-flowing natural language just like humans. You need to be able to process and transform noisy, unstructured textual data into structured, vectorized formats that can be understood by any machine learning algorithm. Principles from Natural Language Processing, Machine Learning or Deep Learning, all of which fall under the broad umbrella of Artificial Intelligence, are effective tools of the trade. Based on my previous posts, an important point to remember here is that any machine learning algorithm is based on principles of statistics, math and optimization.
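As a small illustration of turning free text into a "structured, vectorized format", the sketch below builds bag-of-words count vectors using only the Python standard library. This is the simplest traditional representation, not the deep learning methods the full post goes on to cover; the helper names (tokenize, build_vocabulary, vectorize) and the sample documents are assumptions made for the example.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphanumeric runs (a crude normalizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_vocabulary(documents):
    """Map every distinct token to a fixed column index (sorted for stability)."""
    terms = sorted({tok for doc in documents for tok in tokenize(doc)})
    return {term: idx for idx, term in enumerate(terms)}


def vectorize(text, vocabulary):
    """Return a count vector for `text`; out-of-vocabulary tokens are ignored."""
    vector = [0] * len(vocabulary)
    for term, count in Counter(tokenize(text)).items():
        if term in vocabulary:
            vector[vocabulary[term]] = count
    return vector


docs = ["The sky is blue.", "The blue sky, the BLUE sea!"]
vocabulary = build_vocabulary(docs)
print(vocabulary)                       # {'blue': 0, 'is': 1, 'sea': 2, 'sky': 3, 'the': 4}
print(vectorize(docs[0], vocabulary))   # [1, 1, 0, 1, 1]
print(vectorize(docs[1], vocabulary))   # [2, 0, 1, 1, 2]
```

Every document becomes a fixed-length numeric vector, which is the form most machine learning algorithms expect, at the cost of discarding word order and context.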


R Machine Learning By Example: Raghav Bali, Dipanjan Sarkar: 9781784390846: Amazon.com: Books

@machinelearnbot

If things continue at the current pace, half of India's IT professionals will have published a "data science" book with Packt by 2030. I am getting tired of reviewing new entries in this stream of low-quality copycats, written by people whose only qualification is the ability to read a couple of relevant books (if the reader is lucky, not those from Packt) and to lift some relevant R code from those books or from R packages' vignettes. My recommendation is "An Introduction to Statistical Learning" by James, Witten, Hastie and Tibshirani. I remember the story of "Learning Data Mining with R" by Makhabel. There were no reviews for months, then I posted mine, two stars.